STABILITY AND PERFORMANCE OF STOCHASTIC
PREDICTIVE CONTROL 

DEBASISH CHATTERJEE AND JOHN LYGEROS 



Abstract. This article is concerned with stability and performance of controlled stochastic processes under receding horizon policies. We carry out a systematic study of methods to guarantee stability under receding horizon policies via appropriate selections of cost functions in the underlying finite-horizon optimal control problem. We also obtain quantitative bounds on the performance of the system under receding horizon policies as measured by the long-run expected average cost. The results are illustrated with the help of several simple examples.



1. Introduction 



With the steady growth in the availability of fast computing machines, control 
techniques that involve algorithmic selection of actions that minimize some per- 
formance objective have gained prominence. Receding horizon predictive control, 
which is based on such algorithmic selection procedures, has evolved over the years 
into one of the most useful and applicable control synthesis techniques currently 
available to a control engineer; see e.g., [11] for a survey of the modern theory 
and applications in the deterministic setting. Stochastic versions of receding horizon techniques initially evolved within the operations research community, see e.g.,
[6, 7], with inventory and manufacturing systems as primary application areas, and 
have steadily filtered into the domain of control systems, with current applications 
in financial engineering, process control, industrial electronics, power systems, etc. 

While the deterministic and robust versions of receding horizon control techniques have become standardized and are well documented, the available literature on the stochastic version still lacks a comprehensive and systematic treatment. Especially prominent in this regard is the matter of stability of control systems under stochastic receding horizon control; indeed, most of the literature does not appear to take advantage of the significantly developed and advanced results on stability of Markov processes. Chief among the reasons for this discrepancy between the deterministic and stochastic settings, perhaps, is the fact that the technical arguments involved in the stochastic version of stability are significantly heavier than their deterministic counterparts. Indeed, while the bare-essential arguments involved in establishing Lyapunov stability of discrete-time deterministic dynamical systems are few and quite classical, the technical arguments and conditions in the theory of stability of Markov chains are far more numerous, and constitute an active area of research even today. In addition to that, one has a diverse library
of notions of stability that are peculiar to the stochastic setting, and are simply 
non-existent in the deterministic or the robust setting. 

This article is an attempt at bridging this gap — we connect receding horizon 
control techniques to some of the principal elements of the theory of stability of 
Markov processes. Motivated by, and in the spirit of [12, §3], first we systematically 
develop a framework for studying stability of discrete-time controlled stochastic sys- 
tems under receding horizon policies. We critically examine two approaches in this 
connection, namely, ensuring stability by appropriate selection of the cost func- 
tions, and by adjoining an appropriate constraint to the underlying finite-horizon 
optimal control problem, before focussing on the former. Against the backdrop of 
certain standard (and not-so-standard) conditions for stability of Markov processes, we establish conditions on the cost functions such that these stability conditions are satisfied. Thus, this selection procedure, by design, ensures that the closed-loop system under the corresponding receding horizon control policy is stable. To this end, we utilize theorems on stability of Markov processes off the shelf. As such, the results pertaining to stability presented here should be regarded as representative guidelines: rather than offer a set of stand-alone results, we provide a
general framework for establishing stability results. The details for specific appli- 
cations must be worked out on a case-by-case basis, as we illustrate through several 
examples. 

In addition, we develop a framework for analyzing the performance of the closed- 
loop systems under stochastic receding horizon control policies. Selecting a long-run 
expected average cost derived from the underlying finite-horizon optimal control
problem as our performance index, we provide quantitative bounds on this perfor- 
mance index under receding horizon policies and mild hypotheses. Observe that 
receding horizon policies are extracted from a finite-horizon optimal control prob- 
lem, and as such do not naturally offer any clue concerning the long-run expected 
average costs that they incur. The relationship between stability and performance 
is also explored here. In particular, we obtain a bound on the aforementioned per- 
formance index under a receding horizon policy that also ensures stability in an 
appropriate sense. 

The layout of this article is as follows: §2 provides the description of the control 
systems. Our results on stability under receding horizon control are contained in §3,
while performance bounds are provided in §4. Several examples illustrate our results 
throughout §3 and §4. The proofs of our results are provided in the Appendices §A 
and §B. The emphasis here is on conceptual clarity and a systematic presentation 
sans heuristics. The setting, insofar as the system, the associated receding horizon 
problem, and the results are concerned, is at an abstract level; this choice is targeted 
at conveying the key ideas in a transparently clear fashion, without the overload of 
excessive notation. In particular, the ideas presented here can be readily generalized 
to the setting of Markov decision processes; we choose to stay with simpler notation 
and technical requirements here. Numerical tractability of the underlying optimal 
control problems, which is an integral aspect of receding horizon control techniques, 
is not addressed here. 



2. System description 

Consider the discrete-time dynamical system given by the recursion 
$$x_{t+1} = f(x_t, u_t, w_t), \qquad x_0 = x \text{ given},\ t \in \mathbb{N}_0, \tag{2.1}$$
where

◦ $x_t \in \mathbb{R}^d$, $u_t \in \mathbb{U} \subset \mathbb{R}^m$, and $w_t \in \mathbb{W} \subset \mathbb{R}^p$ are the states, the control actions, and the noise at time $t$;

◦ $f : \mathbb{R}^d \times \mathbb{U} \times \mathbb{W} \to \mathbb{R}^d$ is a measurable function;¹

◦ $\mathbb{U}$ is the (nonempty) control set, assumed to be measurable and containing the element $0 \in \mathbb{R}^m$;

◦ $\mathbb{W} \subset \mathbb{R}^p$ is assumed to be a measurable set;

◦ $(w_t)_{t\in\mathbb{N}_0}$ is the process noise, a $\mathbb{W}$-valued random process with the $w_t$'s independent and identically distributed.²

Let $k$ be a positive integer. Recall that a $k$-stage (feedback) policy is a collection $\pi_{0:k-1} := (\pi_0, \pi_1, \ldots, \pi_{k-1})$ of measurable functions $\pi_i : \mathbb{R}^d \to \mathbb{U}$; we set the control action at time $t$ as $u_t = \pi_t(x_t)$. For the synthesis of control actions in a receding horizon fashion, we consider given:

◦ a horizon $N \in \mathbb{N}$,

◦ a cost-per-stage function $c : \mathbb{R}^d \times \mathbb{U} \to [0, +\infty[$ and a final cost function $c_F : \mathbb{R}^d \to [0, +\infty[$, both assumed to be measurable, and

◦ a class $\Pi$ of feedback policies.

We introduce the $N$-horizon value function
$$V_N(x, \pi) := \mathbb{E}^{\pi}_x\Big[\sum_{i=0}^{N-1} c(x_i, u_i) + c_F(x_N)\Big], \tag{2.2}$$
where the policies $\pi$ belong to the class $\Pi$.

Assumption. Without detailing the specifics, 

(A1) we assume sufficient regularity of the process $(w_t)_{t\in\mathbb{N}_0}$ such that the cost (2.2) is finite for all $x \in \mathbb{R}^d$ and all $\pi \in \Pi$.

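With these ingredients, the centerpiece of receding horizon control can be stated: it consists of the $N$-horizon optimal control problem
$$\begin{aligned} &\text{minimize} && V_N(x, \pi)\\ &\text{subject to} && \begin{cases} \pi \in \Pi,\\ \text{dynamics (2.1)}. \end{cases} \end{aligned} \tag{2.3}$$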

Assumption. In addition to (Al), we assume that 

(A2) the minimization problem (2.3) is well-defined for all $x \in \mathbb{R}^d$, i.e., for each boundary value $x \in \mathbb{R}^d$, there exists a policy $\pi^* \in \Pi$ that solves (2.3).

Conditions under which (A2) holds are of a technical nature, and well docu- 
mented, e.g., in [6, Chapter 3]. 



¹ Henceforth "measurability" on Euclidean spaces will refer to "Borel measurability."

² All random vectors are assumed to be defined on some underlying probability space, for which $\mathbb{P}(\cdot)$ is the probability measure, and $\mathbb{E}[\cdot]$ is the expectation under $\mathbb{P}$. The assumption that the $w_t$'s are independent and identically distributed can be substantially weakened at the expense of some notational clutter; we choose to stay with the simpler setting here.

³ If $\varphi : \mathbb{R}^d \to \mathbb{R}$ is a measurable function, then $\mathbb{E}^{\pi}_x[\varphi(x_t)]$ stands for the conditional expectation of $\varphi(x_t)$ given $x_0 = x$, where $x_t$ is the state at time $t$ under the policy $\pi$; $\mathbb{P}^{\pi}_x$ is the corresponding conditional probability.
We denote the optimal value function $V_N(x, \pi^*)$ by $V_N^*(x)$ for all $x \in \mathbb{R}^d$. The system (2.1) under the optimal policy generates the optimal state trajectory $(x_t^*)_{t=0}^{N}$ given by
$$x^*_{t+1} = f\big(x^*_t, \pi^*_t(x^*_t), w_t\big), \qquad x^*_0 = x \text{ given},\ t = 0, 1, \ldots, N-1. \tag{2.4}$$

The technique of receding horizon control consists of applying, recursively, the first element $\pi_0^*$ of the optimal policy obtained from the minimization problem (2.3), thereby generating the receding horizon policy
$$\hat{\pi} := (\pi_0^*, \pi_0^*, \ldots). \tag{2.5}$$

To wit, given the state $x_t$ at time $t$, one solves the minimization problem (2.3) with $x = x_t$, obtains the optimal policy $\pi^*$, applies the first element $\pi_0^*$ of the policy, moves to time $t+1$, and repeats the preceding steps. The system (2.1) under the policy $\hat{\pi}$ generates the state trajectory $(x_t)_{t\in\mathbb{N}_0}$ via the recursion
$$x_{t+1} = f\big(x_t, \pi_0^*(x_t), w_t\big), \qquad x_0 = x \text{ given},\ t \in \mathbb{N}_0. \tag{2.6}$$
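The procedure (2.5)-(2.6) can be sketched on a small finite toy problem. The instance below is hypothetical and not taken from the article: integer states clipped to $[-5, 5]$, actions in $\{-1, 0, 1\}$, three-valued i.i.d. noise, and the assumed costs $c(x,u) = x^2 + u^2$ and $c_F(x) = 4x^2$. The $N$-horizon problem (2.3) is solved by backward dynamic programming over Markov policies, and only the first element $\pi_0^*$ is applied at each step. The names `solve_first_action` and `receding_horizon` are illustrative only.

```python
import random

# Hypothetical toy instance of (2.1), chosen only for illustration:
# integer states clipped to [-5, 5], actions in {-1, 0, 1},
# and i.i.d. noise taking the values -1, 0, 1.
STATES = range(-5, 6)
ACTIONS = (-1, 0, 1)
NOISE = ((-1, 0.25), (0, 0.5), (1, 0.25))  # (value, probability)


def f(x, u, w):
    return max(-5, min(5, x + u + w))


def c(x, u):   # assumed cost-per-stage
    return x * x + u * u


def c_F(x):    # assumed final cost
    return 4 * x * x


def expect(h, x, u):
    """E[h(f(x, u, w_0))] under the discrete noise law."""
    return sum(p * h(f(x, u, w)) for w, p in NOISE)


def solve_first_action(N):
    """Solve the N-horizon problem (2.3) by backward recursion and return
    the first element pi_0^* of an optimal policy as a function of x."""
    V = {x: float(c_F(x)) for x in STATES}          # V_0^* = c_F
    for _ in range(N - 1):                           # build V_{N-1}^*
        V = {x: min(c(x, u) + expect(V.__getitem__, x, u) for u in ACTIONS)
             for x in STATES}
    return lambda x: min(ACTIONS, key=lambda u: c(x, u) + expect(V.__getitem__, x, u))


def receding_horizon(x0, N, T, seed=0):
    """Closed loop (2.6): at every t apply pi_0^*(x_t), sample w_t, advance."""
    rng = random.Random(seed)
    # (2.3) is time-invariant here, so re-solving it at x = x_t always yields
    # the same first element; it is therefore computed once.
    pi0 = solve_first_action(N)
    values = [w for w, _ in NOISE]
    weights = [p for _, p in NOISE]
    xs = [x0]
    for _ in range(T):
        w = rng.choices(values, weights=weights)[0]
        xs.append(f(xs[-1], pi0(xs[-1]), w))
    return xs
```

Because the problem data do not depend on $t$, the map $x \mapsto \pi_0^*(x)$ is computed once and reused. A problem with time-varying data would instead require re-solving (2.3) inside the loop.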

Observe that the process $(x_t)_{t\in\mathbb{N}_0}$ generated by (2.6) is Markovian, i.e., the probability distribution of the future state $x_{t+1}$ at time $t+1$ is conditionally independent of the past $(x_s)_{s=0}^{t-1}$ given the present state $x_t$. Indeed, for $S$ a Borel subset of $\mathbb{R}^d$, since $w_t$ is independent of $(x_s)_{s=0}^{t}$, we have
$$\mathbb{P}\big(x_{t+1} \in S \,\big|\, (x_s)_{s=0}^{t}\big) = \mathbb{P}\big(f(x_t, \pi_0^*(x_t), w_t) \in S \,\big|\, (x_s)_{s=0}^{t}\big) = \mathbb{P}\big(f(x_t, \pi_0^*(x_t), w_t) \in S \,\big|\, x_t\big). \tag{2.7}$$
The following sections will study aspects of both qualitative and quantitative behavior of the process $(x_t)_{t\in\mathbb{N}_0}$ generated by (2.6).

3. Stability under receding horizon control 

Stability of the controlled process $(x_t)_{t\in\mathbb{N}_0}$ generated by the recursion (2.6) is a desirable property in practice. There are two techniques by which stability can be ensured:

(S1) By appropriate choice of cost functions: Stability of the controlled process $(x_t)_{t\in\mathbb{N}_0}$ can be ensured by an appropriate selection of the cost-per-stage function $c$ and the final cost function $c_F$. In the deterministic setting, conditions for asymptotic stability in terms of the cost functions are standard; see e.g., [12, §3] and [10] for details and further references. In the stochastic setting the aim is to arrive at a Lyapunov-like inequality in terms of the cost functions, which in turn ensures stability of the closed-loop system. While this technique conceptually leads to an elegant analysis, two points are worth noting:
◦ For a control engineer, the selection of the cost functions $c$ and $c_F$ is typically dictated by the physics of the problem. In case the stability conditions are not satisfied by the natural candidates $c$ and $c_F$, the engineer may be forced to select cost functions that have little to do with the particular physical aspects of the plant.
◦ The applicability of predictive control is contingent upon numerical tractability of the finite-horizon optimal control problem (2.3). The extent of flexibility in the choice of the functions $c$ and $c_F$ is therefore determined by the cases where numerically tractable problems can be derived from (2.3). In other words, applicability of this technique is limited by numerical tractability of the problem (2.3). However, as we shall illustrate through examples, there may be more than one choice of cost functions that ensures stability; there is, therefore, some freedom which the control engineer can utilize to suit numerical tractability.




In the deterministic setting, standard stability conditions under receding hori- 
zon control require the existence of a stabilizing feedback controller inside a 
certain terminal set, such that the terminal set becomes invariant for the controlled system [12, §3]. In the stochastic setting, in general, even the weakest forms of stability (for instance, positive recurrence, existence of invariant measures, etc.) require a certain "drift condition" to hold outside a bounded set [13]. This difference between the two settings is generally unavoidable owing to the lack of a notion analogous to invariance in the deterministic
setting; see also Remark 7 for a more specific discussion. For instance, in sys- 
tems where there is non-zero probability of jumps in the state infinitely often 
and the magnitude of the jumps is not bounded, the notion of deterministic 
invariance does not make sense for any bounded subset of the state-space. Of 
course, this assertion does not apply to systems subjected to bounded noise 
where it may be possible to perform a robust analysis, but it does indeed ap- 
ply to the standard benchmark case of a linear control system with additive 
and independent Gaussian noise. 
(S2) By adjoining an appropriate constraint to the optimal control problem (2.3): This 
technique was first adopted in [1, 8, 9] in the context of receding horizon control 
of linear stochastic controlled systems. It consists of adjoining a constraint 
to the optimal control problem (2.3), so that the modified optimal control 
problem stays feasible for all $x \in \mathbb{R}^d$, and the resulting receding horizon policy $\hat{\pi}$ defined in (2.5) ensures stability. Observe that (i) the problem (2.3) to which one intends to adjoin the constraint has a finite horizon, while (ii) the target of the constraint, namely, attaining stability of the closed-loop system, is a necessarily infinite-horizon notion. Two points to note:
◦ It is imperative to ensure that the problem (2.3) with the new constraint is feasible for all boundary values $x$; this necessarily imposes restrictions on the type of admissible constraints.
◦ Adjoining a constraint to the problem (2.3) potentially shifts both the optimal value and the optimizer $\pi^*$ corresponding to the original problem (2.3). Therefore, a trade-off between performance and a certain desirable qualitative behavior of the closed-loop system may have to be accepted.
As will be evident from the above discussion, a systematic development of the case (S2) is largely impossible due to the absence of a set of unifying objects inherent to the optimal control problem (2.3). Since the constraints do not, in general, depend on the cost functions, the details of the technique may differ significantly between specific applications.

In this article we focus on (S1); the relevant results are presented in §3.2.
Preparatory to that, in §3.1 we briefly recall certain basic aspects of the general 
theory of stability of discrete-time Markov processes. 

3.1. Review of the general theory of stability of discrete-time Markov 
processes. The type of stability that we shall focus on here concerns boundedness of sequences of the form $(\mathbb{E}_x[h(x_t)])_{t\in\mathbb{N}_0}$ for appropriate functions $h : \mathbb{R}^d \to [0, +\infty[$. For instance, consider $h(z) = \|z\|^p$ for $p \ge 1$. In view of the fact that⁴
$$\mathbb{E}_x\big[\|x_t\|^p\big] = \int_0^{+\infty} p\, r^{p-1}\, \mathbb{P}_x\big(\|x_t\| > r\big)\, dr,$$
boundedness of $(\mathbb{E}_x[\|x_t\|^p])_{t\in\mathbb{N}_0}$ implies that the conditional probability, given the initial condition $x_0 = x$, of the state being at a distance greater than $r$ from the origin decays at least as fast as a constant multiple of $r^{-p}$ as $r$ grows large, with the constant independent of time $t$ (by Markov's inequality). In other words, we have an assertion concerning the behavior of the tails of the conditional probability distributions $\mathbb{P}_x(\|x_t\| > r)$, $t \in \mathbb{N}_0$, uniformly over time $t$. The case $p = 2$ is especially prevalent in the literature, and goes under the name of mean-square boundedness. To understand the qualitative behavior of $\mathbb{R}^d$-valued Markov processes $(x_t)_{t\in\mathbb{N}_0}$, the general strategy consists of studying the behavior of sequences such as $(\mathbb{E}_x[h(x_t)])_{t\in\mathbb{N}_0}$ for "norm-like" functions $h$, and drawing appropriate inferences concerning the former.

⁴ This identity is an immediate consequence of Fubini's theorem.
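The tail identity above can be sanity-checked numerically. As a stand-in for $\|x_t\|$ we assume, purely for illustration, an exponential random variable with rate 1, for which $\mathbb{P}(X > r) = e^{-r}$ and $\mathbb{E}[X^p] = \Gamma(p+1)$. The hypothetical helper `tail_integral` approximates $\int_0^{r_{\max}} p\, r^{p-1}\, \mathbb{P}(X > r)\, dr$ with the trapezoidal rule, truncating at a point where the tail is negligible.

```python
import math


def tail_integral(p, tail, r_max=60.0, n=100_000):
    """Trapezoidal approximation of the integral of p * r**(p-1) * tail(r)
    over [0, r_max]; tail(r) should return P(X > r)."""
    h = r_max / n
    total = 0.0
    for i in range(n + 1):
        r = i * h
        weight = 0.5 if i in (0, n) else 1.0   # trapezoid end-point weights
        total += weight * p * r ** (p - 1) * tail(r)
    return h * total


# Exponential(1) stand-in: the integral should match E[X^p] = Gamma(p + 1).
for p in (1, 2, 3):
    print(p, tail_integral(p, lambda r: math.exp(-r)), math.gamma(p + 1))
```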

Recall that the process $(x_t)_{t\in\mathbb{N}_0}$ generated by (2.6) is Markovian in view of (2.7). For discrete-time Markov processes, the theory of stability is extremely well developed (see e.g., [13] for a book-length treatment), and most of the standard conditions for stability involve what is known as a "negative drift condition."⁵ A generic negative drift condition takes the following form:

(D) there exist measurable functions $\Xi : \mathbb{R}^d \to [0, +\infty[$ and $\Gamma : \mathbb{R}^d \to [0, +\infty[$, and a bounded and measurable set $K \subset \mathbb{R}^d$, such that
$$\mathbb{E}_x[\Xi(x_1)] - \Xi(x) \le -\Gamma(x) \quad \text{for all } x \notin K.$$

Depending on the properties of the functions $\Xi$ and $\Gamma$, it may be possible to assert the type of stability of the process $(x_t)_{t\in\mathbb{N}_0}$. Observe that the condition (D) closely resembles Lyapunov stability conditions for deterministic discrete-time systems.

Perhaps the most well-known drift condition is contained in the following: 

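Proposition 1. Let $(x_t)_{t\in\mathbb{N}_0}$ be a Markov process. Suppose that there exist $\beta > 0$ and $\lambda_0 \in [0, 1[$, a measurable function $V : \mathbb{R}^d \to [0, +\infty[$, and a compact set $K \subset \mathbb{R}^d$ such that $\mathbb{E}_x[V(x_1)] \le \lambda_0 V(x)$ for all $x \notin K$, and $\sup_{x \in K} \mathbb{E}_x[V(x_1)] = \beta$. Then $\mathbb{E}_x[V(x_t)] \le \lambda_0^t V(x) + \beta(1 - \lambda_0)^{-1}$ for all $x \in \mathbb{R}^d$ and $t \in \mathbb{N}_0$.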

The hypotheses of Proposition 1 imply
$$\mathbb{E}_x[V(x_1)] - V(x) \le -(1 - \lambda_0) V(x) \quad \text{for all } x \notin K, \tag{3.1}$$
which is sometimes known as a "geometric drift condition." The condition (3.1) is strong: for all boundary conditions $x$ outside a compact set, the expected value of the function $V$ decreases in one step by at least the fraction $1 - \lambda_0$ of its current value. See e.g., [13] for further details, discussions, and applications of Proposition 1.
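To make the mechanism behind the bound in Proposition 1 concrete, here is a short sketch of the standard iteration argument. It is not necessarily the proof given in the appendices, and it assumes, as is implicit in the statement, that the Markov process is time-homogeneous, so that $\mathbb{E}[V(x_{t+1}) \mid x_t] = \mathbb{E}_{x_t}[V(x_1)]$:
$$\mathbb{E}[V(x_{t+1}) \mid x_t] \le \lambda_0 V(x_t)\,\mathbf{1}_{\{x_t \notin K\}} + \beta\,\mathbf{1}_{\{x_t \in K\}} \le \lambda_0 V(x_t) + \beta.$$
Taking expectations with $a_t := \mathbb{E}_x[V(x_t)]$ gives $a_{t+1} \le \lambda_0 a_t + \beta$, and unrolling this recursion from $a_0 = V(x)$ yields
$$a_t \le \lambda_0^t V(x) + \beta \sum_{s=0}^{t-1} \lambda_0^s = \lambda_0^t V(x) + \beta\, \frac{1 - \lambda_0^t}{1 - \lambda_0} \le \lambda_0^t V(x) + \beta (1 - \lambda_0)^{-1}.$$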

Among the weakest drift conditions, we have the following: 

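Proposition 2. Let $(x_t)_{t\in\mathbb{N}_0}$ be a Markov process. Suppose that there exist $\beta, M, \varepsilon > 0$, a measurable function $V : \mathbb{R}^d \to [0, +\infty[$, and a compact set $K \subset \mathbb{R}^d$ such that
$$\mathbb{E}_x[V(x_1)] - V(x) \le -\beta \quad \text{for all } x \notin K, \quad\text{and} \tag{3.2}$$
$$\mathbb{E}\Big[\big|V(x_{t+1}) - V(x_t)\big|^{2+\varepsilon} \,\Big|\, \big(V(x_s)\big)_{s=0}^{t}\Big] \le M \quad \text{for all } t \in \mathbb{N}_0. \tag{3.3}$$
Then for each $x \in \mathbb{R}^d$ the sequence $(\mathbb{E}_x[V(x_t)])_{t\in\mathbb{N}_0}$ is bounded.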

Proposition 2 stipulates a constant negative drift (3.2) outside a compact set, as opposed to a geometric negative drift in (3.1). The condition (3.2) is rather weak, and the price for weakening the drift condition is the introduction of a uniform bound (3.3) on the jumps of the process $(V(x_t))_{t\in\mathbb{N}_0}$. In general, both the conditions (3.2) and (3.3) are necessary, and the exponent $2 + \varepsilon$ in (3.3) is tight; see [14] for details and (counter-)examples. An application of Proposition 2 to control of linear systems may be found in [16], and to receding horizon control in [8].

⁵ General results dealing with stability but not relying on negative drift conditions are rare; one example may be found in [4].
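A constant drift of the kind in (3.2) can be observed in a seeded Monte Carlo sketch. We assume, for illustration only, the scalar system $x_{t+1} = x_t + u_t + w_t$ under the saturated feedback $u_t = -\operatorname{sat}_1(x_t)$ (the policy considered later in Example 9), with hypothetical noise uniform on $[-1/2, 1/2]$. For $V(x) = |x|$ and $K = [-2, 2]$, the one-step drift outside $K$ equals $-1$ exactly, and the jumps of $V$ are bounded by 2, so (3.3) holds for every $\varepsilon > 0$. The helper `mean_abs_state` is hypothetical.

```python
import random


def sat1(x):
    """Scalar saturation onto [-1, 1]."""
    return max(-1.0, min(1.0, x))


def mean_abs_state(x0, T, n_paths=2000, seed=1):
    """Sample estimate of E_x[|x_t|], t = 0..T, for x_{t+1} = x_t - sat1(x_t) + w_t
    with w_t ~ Uniform[-1/2, 1/2] (an assumed noise law)."""
    rng = random.Random(seed)
    totals = [0.0] * (T + 1)
    for _ in range(n_paths):
        x = x0
        totals[0] += abs(x)
        for t in range(1, T + 1):
            x = x - sat1(x) + rng.uniform(-0.5, 0.5)
            totals[t] += abs(x)
    return [s / n_paths for s in totals]


means = mean_abs_state(50.0, 200)
print(means[0], means[10], means[-1])
```

Starting far outside $K$, the estimated mean decreases by about one unit per step until the paths enter $[-2, 2]$, after which $|x_t|$ never exceeds 2 and the sequence stays bounded, as Proposition 2 predicts.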

Propositions 1 and 2 may be viewed as representing two extremes of the spectrum 
of stability results involving negative drift conditions. We refer the reader to [13] 
for other drift conditions and their corresponding assertions concerning stability. 
Proposition 2 also highlights some of the features peculiar to stochastic control — 
indeed, while in the deterministic setting, the drift (in terms of Lyapunov functions) 
needs to be merely negative definite to ensure global asymptotic convergence of the 
system, the stability assertions in the stochastic setting depend crucially on the 
functional nature of the drift in addition to other conditions. 

3.2. Stability under appropriate selection of cost functions. 

Assumption. In addition to (A2), we stipulate that 

(A3) there exist a measurable feedback control function $g : \mathbb{R}^d \to \mathbb{U}$, a number $b \ge 0$, and a measurable and bounded set $K \subset \mathbb{R}^d$ such that
((A3)-i) $\sup_{z \in K} \big| c(z, g(z)) - c_F(z) + \mathbb{E}[c_F \circ f(z, g(z), w_0)] \big| \le b$,

((A3)-ii) $c(z, g(z)) + \mathbb{E}[c_F \circ f(z, g(z), w_0)] \le c_F(z)$ for all $z \notin K$.

Observe that ((A3)-ii) is a negative drift condition in disguise: indeed, it is precisely
$$\mathbb{E}_x\big[c_F(x_1^g)\big] - c_F(x) \le -c\big(x, g(x)\big) \quad \text{for all } x \notin K,$$
where $x_1^g := f(x, g(x), w_0)$; here the cost functions $c_F$ and $c$ play the roles of the functions $\Xi$ and $\Gamma$ in (D), respectively. ("Global" conditions, similar in spirit to ((A3)-ii), have been proposed recently in [15] in the context of stochastic receding horizon control.) However, since the stabilizing feedback $g$ is not necessarily identical to $\pi_0^*$, the condition ((A3)-ii) does not by itself guarantee stability under the receding horizon control policy $\hat{\pi}$. Nevertheless, from the boundedness condition ((A3)-i) and the drift condition ((A3)-ii), both expressed in terms of the cost functions $c$ and $c_F$, we can establish the following drift condition involving the optimal value function $V_N^*$ corresponding to (2.3), under the receding horizon policy $\hat{\pi}$:

Theorem 3. Consider the controlled system (2.1) with its accompanying data, and 
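the optimal control problem (2.3). Suppose that Assumption (A3) holds. Then
$$\text{for any } x \in \mathbb{R}^d, \quad \mathbb{E}_x\big[V_N^*(x_1^*)\big] - V_N^*(x) \le -c\big(x, \pi_0^*(x)\big) + b, \tag{3.4}$$
where $(x_t^*)_{t=0}^{N}$ is the sequence generated by the recursion (2.4). In particular, under the receding horizon policy $\hat{\pi}$ derived from (2.3), the closed-loop process $(x_t)_{t\in\mathbb{N}_0}$ generated by (2.6) satisfies
$$\text{for any } x \in \mathbb{R}^d, \quad \mathbb{E}^{\hat{\pi}}_x\big[V_N^*(x_1)\big] - V_N^*(x) \le -c\big(x, \pi_0^*(x)\big) + b. \tag{3.5}$$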
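Theorem 3 can be checked numerically on a small finite problem where $V_N^*$ is computable exactly by backward dynamic programming. The instance below is hypothetical and not from the article: integer states clipped to $[-5, 5]$, actions in $\{-1, 0, 1\}$, three-valued i.i.d. noise, assumed costs $c(x,u) = x^2 + u^2$ and $c_F(x) = 4x^2$, the candidate feedback $g(x) = -\operatorname{sign}(x)$, and the bounded set $K = \{0\}$. The sketch verifies (A3), then checks (3.4)/(3.5) and the upper bound $V_N^* \le c_F + Nb$ from (3.6) at every state.

```python
# Hypothetical finite toy instance (not from the article).
STATES = range(-5, 6)
ACTIONS = (-1, 0, 1)
NOISE = ((-1, 0.25), (0, 0.5), (1, 0.25))  # (value, probability)


def f(x, u, w):
    return max(-5, min(5, x + u + w))


def c(x, u):      # assumed cost-per-stage
    return x * x + u * u


def c_F(x):       # assumed final cost
    return 4 * x * x


def g(x):         # candidate feedback for (A3): push toward the origin
    return -1 if x > 0 else (1 if x < 0 else 0)


def expect(h, x, u):
    """E[h(f(x, u, w_0))] under the discrete noise law."""
    return sum(p * h(f(x, u, w)) for w, p in NOISE)


def a3_term(z):
    """c(z, g(z)) + E[c_F(f(z, g(z), w_0))] - c_F(z), the quantity in (A3)."""
    return c(z, g(z)) + expect(c_F, z, g(z)) - c_F(z)


K = [0]                                                  # assumed bounded set
b = max(abs(a3_term(z)) for z in K)                      # ((A3)-i)
A3_ii = all(a3_term(z) <= 0 for z in STATES if z not in K)   # ((A3)-ii)


def value_functions(N):
    """Backward recursion: V[k] is the optimal k-horizon value V_k^*."""
    V = [{x: float(c_F(x)) for x in STATES}]
    for _ in range(N):
        prev = V[-1]
        V.append({x: min(c(x, u) + expect(prev.__getitem__, x, u) for u in ACTIONS)
                  for x in STATES})
    return V


def check_theorem3(N, tol=1e-9):
    """Check the drift bound (3.4)/(3.5) and V_N^* <= c_F + N b at every state."""
    V = value_functions(N)
    VN, VN1 = V[N], V[N - 1]
    for x in STATES:
        # First element pi_0^*(x) of an N-horizon optimal policy.
        u = min(ACTIONS, key=lambda a: c(x, a) + expect(VN1.__getitem__, x, a))
        drift = expect(VN.__getitem__, x, u) - VN[x]
        if drift > -c(x, u) + b + tol:
            return False
        if VN[x] > c_F(x) + N * b + tol:
            return False
    return True
```

The check passes because (A3) makes $c(z, g(z)) + \mathbb{E}[c_F(f(z, g(z), w_0))] \le c_F(z) + b$ at every $z$, which propagates through the recursion as $V_k^* \le V_{k-1}^* + b$.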

Theorem 3 is the first of our two main results, and it is a representative statement aimed at establishing a connection between receding horizon control and stability of Markov processes. Observe that even though (3.5) does not resemble a negative drift condition per se, a condition analogous to (D) can be extracted from (3.5) under appropriate assumptions on the function $c$.⁶ Once such a procedure has been carried out, one can apply appropriate results on stability of Markov processes, e.g., Propositions 1 or 2, to assert boundedness of the sequence $(\mathbb{E}^{\hat{\pi}}_x[V_N^*(x_t)])_{t\in\mathbb{N}_0}$. We shall illustrate applications of Theorem 3 through our results and examples in the sequel.

⁶ Of course, (3.5) may not lead to a negative drift outside a bounded set, e.g., if the function $c$ is bounded above by $b$; and even if a negative drift condition can be extracted from (3.5), it may not be possible to assert boundedness of the sequence $(\mathbb{E}^{\hat{\pi}}_x[V_N^*(x_t)])_{t\in\mathbb{N}_0}$, e.g., if the conditions (3.2) and (3.3) do not hold simultaneously.

In general, analytical expressions of the optimal value functions $V_N^*$ are difficult to obtain; however, $V_N^*$ can be bounded above and below in terms of the final cost $c_F$ and the cost-per-stage function $c$, respectively, as follows:

Proposition 4. Consider the controlled system (2.1) with its accompanying data, and the optimal control problem (2.3). Suppose that Assumption (A3) holds. Then for all $x \in \mathbb{R}^d$,
$$c\big(x, \pi_0^*(x)\big) + (N-1) \inf_{(z,u) \in \mathbb{R}^d \times \mathbb{U}} c(z,u) + \inf_{(z,u) \in \mathbb{R}^d \times \mathbb{U}} \mathbb{E}\big[c_F\big(f(z, u, w_0)\big)\big] \;\le\; V_N^*(x) \;\le\; c_F(x) + Nb. \tag{3.6}$$

Proposition 4 can be employed in conjunction with Theorem 3 to arrive at stability conditions under additional hypotheses:

Proposition 5. Consider the controlled system (2.1) with its accompanying data, and the optimal control problem (2.3). Suppose that Assumption (A3) holds. Assume further that the cost functions satisfy:

◦ $\lim_{\|z\| \to +\infty} c_F(z) = +\infty$;

◦ there exist measurable functions $c_s : \mathbb{R}^d \to [0, +\infty[$ and $c_c : \mathbb{U} \to [0, +\infty[$ such that $c(z, v) = c_s(z) + c_c(v)$ for all $(z, v) \in \mathbb{R}^d \times \mathbb{U}$, and $\lim_{\|z\| \to +\infty} c_s(z) = +\infty$;

◦ there exist a constant $\alpha \in\, ]0, 1[$ and a compact set $K \subset \mathbb{R}^d$ such that $c_s(z) \ge \alpha\, c_F(z)$ for all $z \notin K$.

Then under the receding horizon policy $\hat{\pi}$, the function $V_N^*$ satisfies a geometric drift condition outside some compact subset of $\mathbb{R}^d$. In particular, for each $x \in \mathbb{R}^d$ the sequence $(\mathbb{E}^{\hat{\pi}}_x[V_N^*(x_t)])_{t\in\mathbb{N}_0}$ is bounded.

Example 6 (The LQ problem). Consider the controlled system (2.1) with $f(x, u, w) = Ax + Bu + w$ for matrices $A \in \mathbb{R}^{d\times d}$ and $B \in \mathbb{R}^{d\times m}$. Let $\mathbb{U} = \mathbb{R}^m$ and $\mathbb{W} = \mathbb{R}^d$. Let $w_0$ have a continuous density on $\mathbb{R}^d$, $\mathbb{E}[w_0] = 0$, and $\mathbb{E}[w_0 w_0^\top] = \Sigma$ for some non-negative definite and symmetric matrix $\Sigma \in \mathbb{R}^{d\times d}$. Assume that the pair $(A, B)$ is stabilizable. Then, by [2, Proposition 11.10.5], for every symmetric and positive definite matrix $Q \in \mathbb{R}^{d\times d}$ there exist a matrix $K \in \mathbb{R}^{m\times d}$ and a symmetric and positive definite matrix $P \in \mathbb{R}^{d\times d}$ such that $(A + BK)^\top P (A + BK) - P = -Q$. Consider the policy $\pi = (g, g, \ldots)$, where $\mathbb{R}^d \ni x \mapsto g(x) := Kx \in \mathbb{R}^m$, and define the function $\mathbb{R}^d \ni x \mapsto V(x) := x^\top P x \in [0, +\infty[$. Then it follows from Proposition 1 and standard arguments that the closed-loop process $(x_t)_{t\in\mathbb{N}_0}$ under the policy $\pi$ is stable in the sense that $\mathbb{E}^{\pi}_x[V(x_t)] \le \lambda_0^t V(x) + \beta(1 - \lambda_0)^{-1}$ for all $t \in \mathbb{N}_0$, where the contraction factor $\lambda_0$, the compact set $\mathcal{K}$ outside which the geometric drift holds, and the constant $\beta$ are
$$\lambda_0 = 1 - \frac{\sigma_{\min}(Q)}{2\,\sigma_{\max}(P)}, \qquad \mathcal{K} = \Big\{ z \in \mathbb{R}^d \;\Big|\; z^\top P z \le \frac{2\,\sigma_{\max}(P)}{\sigma_{\min}(Q)} \operatorname{trace}(P\Sigma) \Big\},$$
$$\beta = \sup_{z \in \mathcal{K}} \big\{ z^\top (A + BK)^\top P (A + BK) z + \operatorname{trace}(P\Sigma) \big\},$$
where $\sigma_{\min}(M)$ and $\sigma_{\max}(M)$ denote the minimal and maximal singular values of a matrix $M$.

In view of the above computations, we define a symmetric and non-negative definite matrix $R \in \mathbb{R}^{m\times m}$ such that $K^\top R K \preccurlyeq Q$, where the relation "$\preccurlyeq$" between the preceding matrices denotes the standard partial order among symmetric non-negative definite matrices. Let us define cost functions $c(z, u) := (1 - \alpha) z^\top Q z + \alpha u^\top R u$ and $c_F(z) := z^\top P z$ for $(z, u) \in \mathbb{R}^d \times \mathbb{R}^m$ and $\alpha \in [0, 1]$. Straightforward calculations show that ((A3)-i) and ((A3)-ii) hold with $g(z) = Kz$ and the preceding definitions of $c$, $c_F$, $\beta$, and the compact set defined above. Consider now the optimal control problem (2.3) for a given $N \in \mathbb{N}$ and the control set $\mathbb{U} = \mathbb{R}^m$. By Theorem 3, a receding horizon controller derived from this optimal control problem ensures that (3.5) holds. It is also possible to verify the hypotheses of Proposition 5 in this case, which implies that for each $x \in \mathbb{R}^d$ the sequence $(\mathbb{E}^{\hat{\pi}}_x[V_N^*(x_t)])_{t\in\mathbb{N}_0}$ is bounded. △
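The stability estimate for the policy $(g, g, \ldots)$ in Example 6 can be checked in the scalar case $d = m = 1$, where everything has a closed form. The numbers below ($A = 1.2$, $B = 1$, $K = -0.7$, $Q = 1$, $\Sigma = 0.5$, $x_0 = 10$) are hypothetical. The sketch solves the scalar Lyapunov equation for $P$, forms $\lambda_0$, the set $\mathcal{K}$, and $\beta$ as in Example 6, and compares the exact value of $\mathbb{E}^{\pi}_x[V(x_t)] = P\,\mathbb{E}[x_t^2]$, obtained from the second-moment recursion $\mathbb{E}[x_{t+1}^2] = (A + BK)^2 \mathbb{E}[x_t^2] + \Sigma$, with the bound $\lambda_0^t V(x) + \beta (1 - \lambda_0)^{-1}$.

```python
def lq_scalar_check(A, B, K, Q, Sigma, x0, T):
    """Scalar instance of Example 6: return P, lambda_0, and pairs
    (exact E[V(x_t)], Proposition 1 bound) for t = 0..T."""
    ac = A + B * K                        # closed-loop coefficient A + BK
    if abs(ac) >= 1:
        raise ValueError("K must make A + BK Schur stable")
    P = Q / (1 - ac * ac)                 # solves ac * P * ac - P = -Q
    lam0 = 1 - Q / (2 * P)                # 1 - sigma_min(Q) / (2 sigma_max(P))
    z2_max = 2 * P * Sigma / Q            # set: P z^2 <= (2P/Q) * trace(P Sigma)
    beta = P * ac * ac * z2_max + P * Sigma   # sup over the set of E_z[V(x_1)]
    m = x0 * x0                           # exact second moment E[x_t^2]
    rows = []
    for t in range(T + 1):
        bound = lam0 ** t * P * x0 * x0 + beta / (1 - lam0)
        rows.append((P * m, bound))
        m = ac * ac * m + Sigma           # E[x_{t+1}^2] = ac^2 E[x_t^2] + Sigma
    return P, lam0, rows


P, lam0, rows = lq_scalar_check(1.2, 1.0, -0.7, 1.0, 0.5, 10.0, 60)
print(P, lam0, rows[0], rows[-1])
```

The exact moments stay below the bound at every $t$, consistent with Proposition 1. The bound is conservative here because $\lambda_0$ uses only half of the decrease $z^\top Q z$ in order to absorb the noise term $\operatorname{trace}(P\Sigma)$ outside $\mathcal{K}$.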



While Example 6 is entirely standard, it highlights a few noteworthy features of 
control of linear systems with affine noise, summarized in the following: 

Remark 7. 

a) In the context of linear systems, the condition (3.1) implies that at all states 
x of large enough norm, the control action must be strong enough to achieve 
this geometric decrease. In the absence of a bound on the magnitude of the 
control actions, it is possible to synthesize linear feedback policies such that a geometric drift in terms of quadratic functions $V$ is attained; for instance, consider the feedback policy $(g, g, \ldots)$ with $g(x) = Kx$ in Example 6.

b) If the control actions are bounded, a control policy whose elements are linear maps of the states is inadmissible. In this case, if the noise has unbounded support, e.g., $w_t$ is a zero-mean Gaussian with a given variance matrix, then the following four cases appear naturally:

▷ if the system matrix $A$ has an eigenvalue outside the closed unit disc, then no control policy can ensure a geometric drift condition with quadratic Lyapunov functions $V$;

▷ if all eigenvalues of $A$ are inside the open unit disc, then irrespective of the feedback policy, a geometric drift condition can be found for a quadratic Lyapunov function $V$, as illustrated in Example 6;

▷ if $A$ is Lyapunov stable, a constant (as opposed to geometric) negative drift condition for the Lyapunov function $V(z) := \|z\|$ was demonstrated in [16]; we provide a geometric drift condition for orthogonal $A$ and Gaussian noise in Proposition 8 below;

▷ if $A$ has eigenvalues on the unit circle with unequal algebraic and geometric multiplicities, stabilization under bounded controls remains an open problem; see [5] for details.

c) Consider the scalar version of Example 6 with $(w_t)_{t\in\mathbb{N}_0}$ a sequence of mutually independent standard normal random variables. Since
$$\mathbb{P}\Big(\inf_{t \in \mathbb{N}_0} w_t = -\infty \text{ and } \sup_{t \in \mathbb{N}_0} w_t = +\infty\Big) = 1,$$
it is impossible to assert almost sure convergence of the states to any compact set under any policy. For the same reason, it is also impossible to assert a statement of the form
$$\mathbb{P}\big(\exists\, t_0 \in \mathbb{N}_0 \text{ and a compact } K \subset \mathbb{R} \text{ such that } x_t \in K \text{ for all } t \ge t_0\big) \ge 1 - \varepsilon$$
for a preassigned $\varepsilon \in\, ]0, 1[$. In other words, under any feedback policy, almost surely, the states will make excursions beyond any given compact set infinitely often over an infinite time horizon. An identical assertion carries over to the multidimensional case. ◁

Preparatory to providing further examples illustrating Theorem 3, we establish a stability result for a particular class of linear systems, as promised in Remark 7-b). Consider the controlled system
$$x_{t+1} = A x_t + B u_t + w_t, \qquad x_0 = x \text{ given},\ t \in \mathbb{N}_0, \tag{3.7}$$
for given matrices $A \in \mathbb{R}^{d\times d}$, $B \in \mathbb{R}^{d\times m}$, and suppose that $(w_t)_{t\in\mathbb{N}_0}$ is a sequence of independent and identically distributed (i.i.d.) random vectors. Suppose that the pair $(A, B)$ is controllable with reachability index $\kappa$.⁷ We define $\mathcal{R}(A, M) := \begin{pmatrix} A^{\kappa-1} M & \cdots & AM & M \end{pmatrix}$ for a matrix $M$ with $d$ rows. For a matrix $M$ we let $M^+$ denote its Moore-Penrose pseudoinverse. Let $U_{\max} > 0$ be given, and suppose that $\|u_t\| \le U_{\max}$ for all $t \in \mathbb{N}_0$. For $r > 0$, we define the radial saturation function
$$\mathbb{R}^d \ni z \mapsto \operatorname{sat}_r(z) := \begin{cases} \min\{r, \|z\|\}\, \dfrac{z}{\|z\|} & \text{if } z \ne 0,\\ 0 & \text{otherwise}. \end{cases}$$
Let $I_d$ denote the $d \times d$ identity matrix. The following proposition provides a bounded control function under which a geometric drift condition is attained:

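Proposition 8. Consider the linear controlled system (3.7) with its accompanying data, suppose that $A$ is orthogonal, and that $w_0$ is Gaussian with mean $0$ and given variance $\Sigma \in \mathbb{R}^{d\times d}$. Let
$$\rho := \ln \mathbb{E}\Big[\exp\Big(\big\| \mathcal{R}(A, I_d) \begin{pmatrix} w_0^\top & \cdots & w_{\kappa-1}^\top \end{pmatrix}^\top \big\|\Big)\Big].$$
Suppose that $U_{\max} > \rho$, define $V(x) := e^{\|x\|}$ for $x \in \mathbb{R}^d$, and let $K := \{ z \in \mathbb{R}^d \mid \|z\| \le 2\rho \}$. Then under the control actions
$$\begin{pmatrix} u_{\kappa t} \\ \vdots \\ u_{\kappa(t+1)-1} \end{pmatrix} := -\mathcal{R}(A, B)^+ \operatorname{sat}_{U_{\max}}\big(A^{\kappa} x_{\kappa t}\big), \qquad t \in \mathbb{N}_0, \tag{3.8}$$
the closed-loop sub-sampled process $(x_{\kappa t})_{t\in\mathbb{N}_0}$ is Markovian, and there exists $\lambda_0 \in\, ]0, 1[$ such that
$$\mathbb{E}\big[V(x_{\kappa(t+1)}) \,\big|\, x_{\kappa t}\big] \le \lambda_0 V(x_{\kappa t}) \quad \text{on the set } \{x_{\kappa t} \notin K\}.$$
In particular, for each $x \in \mathbb{R}^d$ the sequence $(\mathbb{E}_x[e^{\|x_{\kappa t}\|}])_{t\in\mathbb{N}_0}$ is bounded, and the conditional probability distributions $\mathbb{P}_x(\|x_{\kappa t}\| > r)$, $t \in \mathbb{N}_0$, have exponentially thin tails uniformly over $t \in \mathbb{N}_0$.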
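The geometric drift in Proposition 8 can be examined numerically in its simplest instance. We assume, for illustration only, $d = m = 1$ and $A = B = 1$ (orthogonal, with $\kappa = 1$, $\mathcal{R}(A, B)^+ = 1$, and $\mathcal{R}(A, I_d) = 1$), $w_0 \sim \mathcal{N}(0, 1)$, and the hypothetical margin $U_{\max} = \rho + 0.5$. Then (3.8) reduces to $u_t = -\operatorname{sat}_{U_{\max}}(x_t)$ and $\rho = \ln \mathbb{E}[e^{|w_0|}]$. The sketch evaluates $\mathbb{E}[V(x_1)]/V(x)$ exactly through the closed form of $\mathbb{E}[e^{|Y|}]$ for $Y \sim \mathcal{N}(m, \sigma^2)$ on a grid of points $x > 2\rho$ (negative points behave symmetrically) and confirms that the ratio stays below 1, indeed below $e^{\rho - U_{\max}}$.

```python
import math


def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mgf_abs_normal(m, sigma):
    """E[exp(|Y|)] for Y ~ N(m, sigma^2)."""
    s = 0.5 * sigma * sigma
    return (math.exp(m + s) * Phi(m / sigma + sigma)
            + math.exp(-m + s) * Phi(sigma - m / sigma))


def sat(r, z):
    """Scalar radial saturation sat_r(z) = min{r, |z|} * sign(z)."""
    return math.copysign(min(r, abs(z)), z)


def drift_ratio(x, u_max, sigma):
    """E[V(x_1)] / V(x) for V(x) = exp(|x|) and x_1 = x - sat(u_max, x) + w_0."""
    m = x - sat(u_max, x)            # control action u = -sat(A x) with A = B = 1
    return mgf_abs_normal(m, sigma) / math.exp(abs(x))


sigma = 1.0
rho = math.log(mgf_abs_normal(0.0, sigma))           # rho = ln E[exp(|w_0|)]
u_max = rho + 0.5                                    # Proposition 8 needs U_max > rho
grid = [2 * rho + 0.01 * k for k in range(1, 5001)]  # points outside K
worst = max(drift_ratio(x, u_max, sigma) for x in grid)
print(rho, worst, math.exp(rho - u_max))
```

The bound $e^{\rho - U_{\max}}$ follows from $|m + w_0| \le |m| + |w_0|$ with $|m| = |x| - U_{\max}$ whenever $|x| > U_{\max}$; this is why the condition $U_{\max} > \rho$ yields a contraction factor below 1 in this toy case.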

Proposition 8 is of independent interest. Stability of (3.7) under bounded controls was considered in [16], where the authors demonstrated that the same control actions as in (3.8), but under weaker assumptions on the noise,⁸ led to a constant negative drift of the function $\|\cdot\|$ outside a certain compact set for the closed-loop subsampled process $(x_{\kappa t})_{t\in\mathbb{N}_0}$. The technical tools in [16] relied on the considerably involved results of [14]; in contrast, the proof of Proposition 8 that we provide here relies only on the basic Proposition 1, namely, a geometric drift condition expressed in terms of $e^{\|\cdot\|}$. Note that Proposition 8 asserts boundedness of $(\mathbb{E}_x[e^{\|x_t\|}])_{t\in\mathbb{N}_0}$, which is a stronger statement than, and indeed implies, the boundedness of $(\mathbb{E}_x[\|x_t\|^2])_{t\in\mathbb{N}_0}$ asserted in [16]. In particular, by Proposition 8 the conditional distributions $\mathsf{P}_x(\|x_t\| > r)$ have exponentially thin tails if the maximum magnitude of the control actions is large enough, while the main result of [16] asserts that the corresponding distributions have tails that decay faster than inverse quadratically as $r$ grows large.

⁷ That is, $\operatorname{rank}\big(B \;\; AB \;\; \cdots \;\; A^{\kappa-1}B\big) = d$.
⁸ To be precise, it was assumed that $\sup_{t\in\mathbb{N}_0}\mathbb{E}[\|w_t\|^4] < +\infty$. The result in [16], therefore, applies to noise sequences $(w_t)_{t\in\mathbb{N}_0}$ more general than Gaussians.
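The gap between the two kinds of tail estimates can be seen from Markov's inequality alone. The short sketch below compares $\mathsf{P}(\|x_t\| > r) \le C e^{-r}$ (from $\sup_t \mathbb{E}_x[e^{\|x_t\|}] \le C$) with $\mathsf{P}(\|x_t\| > r) \le C_2 / r^2$ (from $\sup_t \mathbb{E}_x[\|x_t\|^2] \le C_2$); the constants $C$ and $C_2$ are hypothetical values chosen only for illustration.

```python
import math


def tail_bound_exponential(C, r):
    """Markov's inequality for e^{||x_t||}: sup_t E[e^{||x_t||}] <= C gives P(||x_t|| > r) <= C e^{-r}."""
    return C * math.exp(-r)


def tail_bound_quadratic(C2, r):
    """Markov's inequality for ||x_t||^2: sup_t E[||x_t||^2] <= C2 gives P(||x_t|| > r) <= C2 / r^2."""
    return C2 / (r * r)


# Hypothetical moment bounds, for illustration only.
C, C2 = 10.0, 10.0
for r in (1.0, 5.0, 10.0, 20.0):
    print(f"r = {r:5.1f}  exponential: {tail_bound_exponential(C, r):.3e}"
          f"  quadratic: {tail_bound_quadratic(C2, r):.3e}")
```

Both bounds are trivial for small $r$, but for moderate $r$ the exponential bound is already many orders of magnitude smaller.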

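Example 9. Consider the controlled system (2.1) with $f(x,u,w) = x + u + w$, where $x, u, w \in \mathbb{R}$. Let $w_0$ have a continuous density on $\mathbb{R}$, with $\mathbb{E}[w_0] = 0$ and $\mathbb{E}[|w_0|^4] < +\infty$. Fix $N \in \mathbb{N}$. Let us investigate the possibility of stability under a receding horizon policy derived from the $N$-horizon optimal control problem⁹
$$\begin{aligned}
&\text{minimize} && \mathbb{E}_x^\pi\Big[\sum_{t=0}^{N-1}\mathbf{1}_{\mathbb{R}\setminus[-2,2]}(x_t) + |x_N|\Big]\\
&\text{subject to} && \begin{cases}\pi \in \Pi, \text{ where } \Pi \text{ is a class of Markovian policies},\\ \pi_i(z) \in [-1,1] \text{ for all } i = 0,\ldots,N-1 \text{ and } z \in \mathbb{R},\\ \text{dynamics (2.1) with } f(x,u,w) = x+u+w.\end{cases}
\end{aligned}\tag{3.9}$$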

The cost-per-stage function $c(z,u) = \mathbf{1}_{\mathbb{R}\setminus[-2,2]}(z)$ ensures that, for each realization of the random noise, the cost grows in proportion to the time that the state spends outside the set $[-2,2]$. The policy that solves the minimization problem (3.9) drives the state inside $[-2,2]$ as fast as possible, and the final cost $c_F$ keeps the final state close to $0$. In the light of Proposition 2 (e.g., following the arguments in [16]), it is not difficult to verify that the policy $(g, g, \ldots)$ with $g(x) = -\mathrm{sat}(x)$ ensures that¹⁰
$$\mathbb{E}\big[|x_{t+1}| \,\big|\, x_t\big] - |x_t| \le -1 \quad \text{on the set } \{|x_t| > 2\},\ t \in \mathbb{N}_0.$$

Note that the closed-loop system is Markovian. Standard computations show that the optimal control problem (3.9) with $K := \{z \in \mathbb{R} \mid |z| \le 2\}$ verifies both ((A3)-i) and ((A3)-ii). The selection of the cost-per-stage function $c$ is not unique; e.g., $c(z,u) = \mathbf{1}_{\mathbb{R}\setminus[-2,2]}(z) + \mathbf{1}_{[-1,1]\setminus[-\lambda,\lambda]}(u)$ works just as well insofar as satisfying ((A3)-i) and ((A3)-ii) is concerned. Numerical tractability of the problem (3.9), even under the assumption that $(w_t)_{t\in\mathbb{N}_0}$ is a sequence of independent and identically distributed standard normal random variables, is a non-trivial matter. △
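Although solving (3.9) exactly is hard, the cost of the admissible (hence sub-optimal) policy $(g, \ldots, g)$ with $g(x) = -\mathrm{sat}(x)$ is easy to estimate, and it upper-bounds $V_N^*(x)$ up to sampling error. The sketch below is a plain Monte Carlo estimate under that policy; the function names and the noise sampler passed in are hypothetical, and standard normal noise is used only as an example.

```python
import random


def stage_cost(x):
    """c(z, u) = indicator of R minus [-2, 2]: one unit of cost per step spent outside [-2, 2]."""
    return 1.0 if abs(x) > 2.0 else 0.0


def sat1(z):
    """Scalar saturation to [-1, 1], as in the footnote defining sat."""
    return max(-1.0, min(1.0, z))


def cost_under_g(x0, N, noise, n_paths=10_000, seed=0):
    """Monte Carlo estimate of E_x[sum_{t<N} c(x_t) + |x_N|] under (g, ..., g), g(x) = -sat(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for _ in range(N):
            cost += stage_cost(x)
            x = x - sat1(x) + noise(rng)  # dynamics x + u + w with u = g(x)
        total += cost + abs(x)  # final cost |x_N|
    return total / n_paths


estimate = cost_under_g(5.0, 3, lambda rng: rng.gauss(0.0, 1.0))
```

With the noise switched off, starting from $x = 5$ and $N = 3$ the state visits $5, 4, 3, 2$, giving three unit stage costs plus a final cost of $2$.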

Example 10 (Example 9 continued). Consider the controlled system (2.1) with $f(x,u,w) = x + u + w$, where $x, u, w \in \mathbb{R}$. Let $(w_t)_{t\in\mathbb{N}_0}$ be a sequence of i.i.d. Gaussian random variables with zero mean and variance $\sigma^2$ for some given $\sigma > 0$, and define $\rho := \ln \mathbb{E}[e^{|w_0|}]$. Let $U_{\max} > \rho$ be given, and define $\lambda := e^{\rho - U_{\max}}$. Fix $N \in \mathbb{N}$. Let us investigate the possibility of stability under a receding horizon policy derived from the $N$-horizon optimal control problem



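$$\begin{aligned}
&\text{minimize} && \mathbb{E}_x^\pi\Big[\sum_{t=0}^{N-1}(1-\lambda)\,e^{|x_t|} + e^{|x_N|}\Big]\\
&\text{subject to} && \begin{cases}\pi \in \Pi, \text{ where } \Pi \text{ is a class of Markovian policies},\\ |\pi_i(z)| \le U_{\max} \text{ for all } i = 0,\ldots,N-1,\ z \in \mathbb{R},\\ \text{dynamics (2.1) with } f(x,u,w) = x+u+w.\end{cases}
\end{aligned}\tag{3.10}$$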

Namely, we have the final cost $c_F(z) = e^{|z|}$ and the cost-per-stage function $c(z,u) = (1-\lambda)e^{|z|}$. By Proposition 8 it follows that ((A3)-ii) holds for some compact $K \subset \mathbb{R}$, and it is straightforward to verify ((A3)-i). Theorem 3 guarantees that a receding horizon controller derived from (3.10) ensures (3.5). An extension to the multidimensional case can be readily obtained with the technical support of Proposition 8. It also follows from standard computations that the hypotheses of Proposition 5 hold in this case, which implies that for each $x \in \mathbb{R}$ the sequence $(\mathbb{E}_x^{\hat\pi}[V_N^*(x_t)])_{t\in\mathbb{N}_0}$ is bounded. △

⁹ Here $\mathbf{1}_A(\cdot)$ denotes the indicator function of the set $A$, defined as $\mathbf{1}_A(z) = 1$ if $z \in A$ and $0$ otherwise.
¹⁰ Here $\mathrm{sat}(\cdot)$ is the standard saturation function defined as $\mathrm{sat}(z) = z$ for $z \in [-1,1]$, $1$ if $z > 1$, and $-1$ otherwise.
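In this scalar Gaussian case the drift in ((A3)-ii) can be checked in closed form. For $w \sim \mathcal{N}(0,\sigma^2)$ a direct computation gives $\mathbb{E}[e^{|m+w|}] = e^{m+\sigma^2/2}\Phi(m/\sigma + \sigma) + e^{-m+\sigma^2/2}\Phi(-m/\sigma + \sigma)$, and in particular $\mathbb{E}[e^{|w_0|}] = 2e^{\sigma^2/2}\Phi(\sigma)$. The sketch below assumes the feedback $g(x) = -\mathrm{sat}_{U_{\max}}(x)$ (Proposition 8 with $d = \kappa = 1$, $A = B = 1$) and checks that $\mathbb{E}[e^{|x_{t+1}|} \mid x_t = x]/e^{|x|} \le \lambda$ for $|x| \ge U_{\max}$, which is exactly ((A3)-ii) for $c = (1-\lambda)e^{|z|}$ and $c_F = e^{|z|}$ there. The values of $\sigma$ and $U_{\max}$ are illustrative.

```python
import math


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exp_abs_moment(m, sigma):
    """E[exp(|m + w|)] for w ~ N(0, sigma^2), in closed form."""
    s2 = sigma * sigma
    return (math.exp(m + s2 / 2.0) * Phi(m / sigma + sigma)
            + math.exp(-m + s2 / 2.0) * Phi(-m / sigma + sigma))


sigma, u_max = 1.0, 3.0                        # illustrative values
rho = math.log(exp_abs_moment(0.0, sigma))     # rho = ln E[e^{|w_0|}]
lam = math.exp(rho - u_max)                    # lambda = e^{rho - U_max}


def drift_ratio(x):
    """E[e^{|x_{t+1}|} | x_t = x] / e^{|x|} under the saturated feedback u = -sat_{U_max}(x)."""
    u = -max(-u_max, min(u_max, x))
    return exp_abs_moment(x + u, sigma) / math.exp(abs(x))
```

By the triangle inequality, $\mathbb{E}[e^{|x - \mathrm{sat}_{U_{\max}}(x) + w|}] \le e^{|x| - U_{\max}}\mathbb{E}[e^{|w|}]$ for $|x| \ge U_{\max}$, with equality at $|x| = U_{\max}$.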



4. Performance under receding horizon control 

In this section we study performance of closed-loop systems under receding hori- 
zon control. Our objective here is to arrive at quantitative bounds on the perfor- 
mance of receding horizon policies over an infinite temporal horizon. 

To this end, we must first select a performance index. We contend that the cost-per-stage function $c$ is a natural candidate with which performance at each time step may be measured. For one, while the final cost $c_F$ plays an important role in the problem (2.3), the first element $\pi_0^*$ of the optimal control policy $\pi^*$ enters the function $c$ but not $c_F$; since a receding horizon policy is constructed out of $\pi_0^*$, the function $c$ is perhaps a more natural candidate than $c_F$ for measuring performance at each time step. Moreover, the function $c$ involves both the states and the control actions, while $c_F$ involves only the states; as such, the expected value of $c(x_t, u_t)$ at time $t$ reflects the performance measured with respect to both the states and the control actions.

In the setting of the dynamical system (2.1) involving stochastic uncertainties, a sum of cost-per-stage functions over $n$ steps typically grows quite quickly with $n$, and the expected total cost
$$\mathbb{E}_x^\pi\Big[\sum_{t=0}^{+\infty} c(x_t, u_t)\Big]$$
may not be suitable for measuring performance. For instance, in the case of the standard optimal linear quadratic regulator, under the optimal policy the quantity $\mathbb{E}_x\big[\sum_{t=0}^{n}(x_t^\top Q x_t + u_t^\top R u_t)\big]$ grows linearly with $n$ (under mild assumptions on the non-negative definite state and control weight matrices $Q$ and $R$, respectively); consequently, the expected total cost over an infinite horizon is not bounded. A more appropriate measure of performance is the long-run expected average cost, measured in terms of the cost-per-stage function $c$ and defined by
$$\limsup_{n\to+\infty}\frac{1}{n+1}\,\mathbb{E}_x^\pi\Big[\sum_{t=0}^{n} c(x_t,u_t)\Big]. \tag{4.1}$$

This is the performance index that we adopt here. For instance, the quantity in (4.1) is well-defined for the linear quadratic problem under mild hypotheses (stabilizability of the underlying linear system). The intuition is clear: the quantity in (4.1) measures the cost averaged over time and averaged across all possible realizations of the process $(x_t)_{t\in\mathbb{N}_0}$. Observe that phenomena that occur over a finite temporal horizon, or that are transient in the sense that they asymptotically die out, do not affect the index (4.1).
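The linear quadratic case above can be checked numerically. The sketch below assumes a scalar system $x_{t+1} = a x_t + b u_t + w_t$ with i.i.d. $\mathcal{N}(0,\sigma^2)$ noise, solves the scalar discrete-time algebraic Riccati equation by fixed-point iteration, applies the optimal gain, and compares the empirical time-averaged cost (4.1) with the standard stationary value $p\,\sigma^2$ (the scalar form of $\operatorname{tr}(PW)$). All parameter values are illustrative; the total cost itself grows roughly like $p\,\sigma^2 n$.

```python
import random


def dare_scalar(a, b, q, r, iters=1000):
    """Fixed-point (value) iteration for the scalar discrete-time algebraic Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p


def average_cost(a, b, q, r, sigma, horizon, seed=0):
    """Empirical (1/horizon) * sum of q x_t^2 + r u_t^2 under u = -k x, and the value p * sigma^2."""
    p = dare_scalar(a, b, q, r)
    k = a * b * p / (r + b * b * p)  # optimal stationary gain
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(horizon):
        u = -k * x
        total += q * x * x + r * u * u
        x = a * x + b * u + rng.gauss(0.0, sigma)
    return total / horizon, p * sigma * sigma


avg, target = average_cost(1.0, 1.0, 1.0, 1.0, 1.0, 200_000)
```

For $a = b = q = r = \sigma = 1$ the Riccati equation reduces to $p^2 - p - 1 = 0$, so the average cost should approach the golden ratio $(1+\sqrt{5})/2$.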

We next provide our second main result: an estimate of performance under the receding horizon policy $\hat\pi$ in terms of the long-run expected average cost. It also shows how stability of the closed-loop process influences the long-run expected average cost.
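Theorem 11. Consider the controlled system (2.1) with its accompanying data, and the optimal control problem (2.3). Let $\hat\pi := (\pi_0^*, \pi_0^*, \ldots)$ denote the receding horizon policy derived from (2.3), and for a measurable feedback $g$ define
$$T_g(z) := c(z, g(z)) - c_F(z) + \mathbb{E}\big[c_F\circ f(z, g(z), w_0)\big], \qquad z \in \mathbb{R}^d. \tag{4.2}$$
Then the following hold:

(11.i) If Assumption (A2) holds, then for every $x \in \mathbb{R}^d$ and $k \in \mathbb{N}$,
$$\mathbb{E}_x^{\hat\pi}\Big[\sum_{\ell=0}^{k} c(x_\ell, u_\ell)\Big] \le V_N^*(x) - \mathbb{E}_x^{\hat\pi}\big[V_N^*(x_{k+1})\big] + \sum_{\ell=0}^{k}\mathbb{E}_x^{\hat\pi}\Big[\mathbb{E}^{\pi^*}\big[T_g(x_{\ell+N}) \,\big|\, x_\ell\big]\Big]. \tag{4.3}$$

(11.ii) If Assumption (A3) holds and
$$\lim_{k\to+\infty}\frac{\mathbb{E}_x^{\hat\pi}\big[V_N^*(x_k)\big]}{k} = 0 \quad \text{for all } x \in \mathbb{R}^d, \tag{4.4}$$
then
$$\limsup_{k\to+\infty}\frac{1}{k+1}\,\mathbb{E}_x^{\hat\pi}\Big[\sum_{\ell=0}^{k} c(x_\ell,u_\ell)\Big] \le b \quad \text{for all } x \in \mathbb{R}^d. \tag{4.5}$$
In particular, if for all $x \in \mathbb{R}^d$ the sequence $(\mathbb{E}_x^{\hat\pi}[V_N^*(x_t)])_{t\in\mathbb{N}_0}$ is bounded, then (4.5) holds.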
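For Example 10 the function $T_g$ in (4.2) can be evaluated exactly, which gives a concrete candidate for the constant $b$ in (4.5). The sketch below assumes the saturated feedback $g(z) = -\mathrm{sat}_{U_{\max}}(z)$ and the closed-form Gaussian moment $\mathbb{E}[e^{|m+w|}]$ used above; the names `make_T_g` and `exp_abs_moment` are hypothetical. Inside $[-U_{\max}, U_{\max}]$ one has $z + g(z) = 0$, so $T_g(z) = e^\rho - \lambda e^{|z|}$, while outside that interval $T_g \le 0$; hence $\sup_z T_g(z) = e^\rho - \lambda$, attained at $z = 0$.

```python
import math


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exp_abs_moment(m, sigma):
    """E[exp(|m + w|)] for w ~ N(0, sigma^2), in closed form."""
    s2 = sigma * sigma
    return (math.exp(m + s2 / 2.0) * Phi(m / sigma + sigma)
            + math.exp(-m + s2 / 2.0) * Phi(-m / sigma + sigma))


def make_T_g(sigma, u_max):
    """Return T_g from (4.2) for the data of Example 10, together with rho and lambda."""
    rho = math.log(exp_abs_moment(0.0, sigma))
    lam = math.exp(rho - u_max)

    def c(z):  # stage cost (1 - lambda) e^{|z|}, independent of the control
        return (1.0 - lam) * math.exp(abs(z))

    def c_F(z):  # final cost e^{|z|}
        return math.exp(abs(z))

    def g(z):  # saturated feedback -sat_{U_max}(z)
        return -max(-u_max, min(u_max, z))

    def T_g(z):  # c(z, g(z)) - c_F(z) + E[c_F(z + g(z) + w_0)]
        return c(z) - c_F(z) + exp_abs_moment(z + g(z), sigma)

    return T_g, rho, lam


T_g, rho, lam = make_T_g(1.0, 3.0)
grid = [i / 100.0 for i in range(-1000, 1001)]
b_est = max(T_g(z) for z in grid)
```

Under these assumptions, (4.5) would bound the long-run expected average cost of the receding horizon policy by $e^\rho - \lambda$.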

Remark 12.

a) The estimate (4.3) in part (11.i) is quite general, and holds under the mild Assumption (A2), which is of a technical nature. Part (11.ii) follows from (11.i) under the additional Assumption (A3), as will be evident from the proof of Theorem 11 presented in Appendix B.

b) We reiterate that the bound on the expected average cost in (4.5) corresponds to the receding horizon policy $\hat\pi$, not to the stabilizing policy $(g, g, \ldots)$. Indeed, although Assumption (A3) stipulates the existence of a stabilizing feedback $g$, this feedback controller is never applied. From the proof of Theorem 11 one sees that the condition (3.5) in Theorem 3 plays a crucial role in (11.ii).

c) The condition (4.4) is technical in nature. In most practical cases, one ensures boundedness of $(\mathbb{E}_x^{\hat\pi}[V_N^*(x_t)])_{t\in\mathbb{N}_0}$ with the help of Theorem 3, and consequently (4.5) holds.

d) Theorem 11 provides a quantitative bound on the performance measured in terms of the cost-per-stage function $c$. Since the stability conditions presented in §3.2 also involve the function $c$, it is possible to establish a connection between the performance bound (4.5) and the stability conditions in Assumption (A3). In general, establishing such a connection may not be possible if stability under the receding horizon policy $\hat\pi$ is ensured by means of adjoining an appropriate constraint, as discussed in (S2) above; indeed, such a constraint may have no relation whatsoever to the cost functions $c$ and $c_F$. Nevertheless, the inequality in (4.3) holds irrespective of whether Assumption (A3) is satisfied. Consequently, if a constraint is adjoined to the problem (2.3) such that the process $(x_t)_{t\in\mathbb{N}_0}$ under the receding horizon policy $\hat\pi$ satisfies (4.4), and the sum on the right-hand side of (4.3) increases at most linearly with $k$, then a bound on the long-run expected average cost can be extracted. Naturally, these verifications, which are likely to be case-specific, need additional analysis.




e) A direct optimal control problem involving minimization of the long-run expected average cost criterion for dynamical systems, e.g.,
$$\begin{aligned}
&\text{minimize} && \limsup_{n\to+\infty}\frac{1}{n+1}\,\mathbb{E}_x^\pi\Big[\sum_{t=0}^{n} c(x_t,u_t)\Big]\\
&\text{subject to} && \text{dynamics (2.1)},
\end{aligned}\tag{4.6}$$
is generally difficult to solve both analytically and numerically; see [6, Chapter 5] for further details. The numerical value of the bound $b$ in (4.5) on the performance index (4.1) under the receding horizon technique may be employed as a measure to decide whether to adopt a receding horizon strategy as an alternative to a direct solution of the minimization problem (4.6). ◁

Example 13 (Example 6 cont'd). Consider the linear system in Example 6 and the selection of cost functions $c$ and $c_F$ as in the final part of Example 6. Standard computations, e.g., as in [6, Chapter 3], show that the function $V_N^*$ is quadratic. In conjunction with the final calculations in Example 6, this shows that (3.5) is a geometric drift condition in this case. Proposition 1 therefore implies that under receding horizon control derived from (2.3), for each $x \in \mathbb{R}^d$ the sequence $(\mathbb{E}_x^{\hat\pi}[V_N^*(x_n)])_{n\in\mathbb{N}_0}$ is bounded. Finally, by Theorem 11 part (11.ii) we see that for each $x \in \mathbb{R}^d$
$$\limsup_{n\to+\infty}\frac{1}{n+1}\,\mathbb{E}_x^{\hat\pi}\Big[\sum_{t=0}^{n} c(x_t,u_t)\Big] \le \beta,$$
where $\beta$ is the constant defined in Example 6. △

Example 14 (Example 9 cont'd). A direct application of Proposition 5 does not appear to be possible. However, by following the arguments in [16] with the support of Proposition 2, one can verify that for each $x \in \mathbb{R}$ the sequence $(\mathbb{E}_x^{\hat\pi}[V_N^*(x_t)])_{t\in\mathbb{N}_0}$ is bounded. By Theorem 11, this in turn implies that the long-run expected average cost is finite under the receding horizon policy extracted from (3.9). △

Example 15 (Example 10 cont'd). In this case it is possible to apply Proposition 5 directly; we skip the standard computations needed to verify its hypotheses. Consequently, the sequence $(\mathbb{E}_x^{\hat\pi}[V_N^*(x_t)])_{t\in\mathbb{N}_0}$ is bounded. By Theorem 11 this implies that the long-run expected average cost is finite under the receding horizon policy extracted from (3.10). △



Appendix A.
This appendix collects our proofs of the results presented in §3.

Proof of Proposition 1. A proof of this proposition is standard; we include it merely for completeness. The assertion follows from the Markovian property of $(x_t)_{t\in\mathbb{N}_0}$ and an iteration scheme as follows:
$$\begin{aligned}
\mathbb{E}_x[V(x_t)] &= \mathbb{E}_x\big[\mathbb{E}[V(x_t)\mid x_{t-1}]\big]\\
&\le \mathbb{E}_x\big[\lambda V(x_{t-1})\mathbf{1}_{\mathbb{R}^d\setminus K}(x_{t-1}) + \beta\,\mathbf{1}_K(x_{t-1})\big]\\
&\le \lambda\,\mathbb{E}_x[V(x_{t-1})] + \beta\,\mathsf{P}_x(x_{t-1}\in K)\\
&\;\;\vdots\\
&\le \lambda^t V(x) + \beta\sum_{k=0}^{t-1}\lambda^{t-1-k}\,\mathsf{P}_x(x_k\in K)\\
&\le \lambda^t V(x) + \frac{\beta}{1-\lambda}.
\end{aligned}$$
□
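The iteration can be checked against its worst case. Bounding $\mathsf{P}_x(x_{t-1} \in K)$ by $1$ gives the recursion $m_t = \lambda m_{t-1} + \beta$ with $m_0 = V(x)$, whose solution $m_t = \lambda^t V(x) + \beta(1-\lambda^t)/(1-\lambda)$ dominates $\mathbb{E}_x[V(x_t)]$. The sketch below iterates this recursion for illustrative values of $V(x)$, $\lambda$ and $\beta$ and confirms the final bound of the proof.

```python
def worst_case_moments(V0, lam, beta, T):
    """Iterate m_t = lam * m_{t-1} + beta from m_0 = V0; m_t upper-bounds E_x[V(x_t)]."""
    m = [V0]
    for _ in range(T):
        m.append(lam * m[-1] + beta)
    return m


# Illustrative values: V(x) = 100, lambda = 0.5, beta = 1.
m = worst_case_moments(100.0, 0.5, 1.0, 50)
```

The sequence decays geometrically towards $\beta/(1-\lambda) = 2$, never exceeding $\lambda^t V(x) + \beta/(1-\lambda)$.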



Proof of Proposition 2. This is a special case of [14, Theorem 1]. □



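Proof of Theorem 3. From the last $N-1$ elements of the optimal policy $\pi^*$ we derive the $N$-length policy $\tilde\pi := (\pi_1^*, \ldots, \pi_{N-1}^*, g)$, where $g : \mathbb{R}^d \to \mathbb{U}$ is the feedback function in Assumption (A3). Let $\tilde V_N(x) := V_N(\tilde\pi, x)$. Recall that the sequence $(x_t^*)_{t=0}^{N}$ is generated by the recursion (2.4). Then, by optimality of the policy $\pi^*$,
$$\begin{aligned}
\mathbb{E}_x\big[V_N^*(x_1^*)\big] - V_N^*(x) &\le \mathbb{E}_x\big[\tilde V_N(x_1^*)\big] - V_N^*(x)\\
&= \mathbb{E}_x\Big[\sum_{t=1}^{N-1} c\big(x_t^*, \pi_t^*(x_t^*)\big) + c\big(x_N^*, g(x_N^*)\big) + c_F\big(f(x_N^*, g(x_N^*), w_N)\big)\Big]\\
&\qquad - \mathbb{E}_x\Big[\sum_{t=0}^{N-1} c\big(x_t^*, \pi_t^*(x_t^*)\big) + c_F(x_N^*)\Big]\\
&= -c\big(x, \pi_0^*(x)\big) + \mathbb{E}_x\Big[c\big(x_N^*, g(x_N^*)\big) + c_F\circ f\big(x_N^*, g(x_N^*), w_N\big) - c_F(x_N^*)\Big].
\end{aligned}$$
If the condition (A3) holds, then the tower property of conditional expectations implies that
$$\begin{aligned}
&\mathbb{E}_x\Big[c\big(x_N^*, g(x_N^*)\big) + c_F\circ f\big(x_N^*, g(x_N^*), w_N\big) - c_F(x_N^*)\Big]\\
&\quad= \mathbb{E}_x\Big[\mathbb{E}\Big[c\big(x_N^*, g(x_N^*)\big) + c_F\circ f\big(x_N^*, g(x_N^*), w_N\big) - c_F(x_N^*) \,\Big|\, \{x_t^*\}_{t=1}^{N}\Big]\Big]\\
&\quad\le \mathbb{E}_x\big[b\,\mathbf{1}_{\{x_N^*\in K\}}\big] = b\,\mathsf{P}_x(x_N^*\in K).
\end{aligned}$$
Substituting back, and using $\mathsf{P}_x(x_N^*\in K) \le 1$, we obtain
$$\mathbb{E}_x\big[V_N^*(x_1^*)\big] - V_N^*(x) \le -c\big(x, \pi_0^*(x)\big) + b,$$
which proves (3.4). The assertion (3.5) follows from the facts that $\hat\pi = (\pi_0^*, \pi_0^*, \ldots)$ and that the closed-loop process is Markovian under $\hat\pi$. □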



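Proof of Proposition 4. Fix $x \in \mathbb{R}^d$. The first inequality $c(x, \pi_0^*(x)) \le V_N^*(x)$ follows immediately from the definition of $V_N^*$. We now prove the second inequality. Let the sequence $(x_t^g)_{t=0}^{N}$ be defined by
$$x_t^g := \begin{cases} x & \text{if } t = 0,\\ f\big(x_{t-1}^g, g(x_{t-1}^g), w_{t-1}\big) & \text{if } t = 1, \ldots, N.\end{cases}$$
In view of ((A3)-ii), we have
$$\begin{aligned}
c_F(x) - V_N^*(x) &= c_F(x) - c(x, g(x)) - V_N^*(x) + c(x, g(x))\\
&\ge \mathbb{E}\big[c_F\circ f(x, g(x), w_0)\big] - V_N^*(x) + c(x, g(x)) - b\\
&= \mathbb{E}\big[c_F(x_1^g) - c(x_1^g, g(x_1^g)) \,\big|\, x_0^g = x\big] - V_N^*(x)\\
&\qquad + \mathbb{E}\big[c(x_1^g, g(x_1^g)) \,\big|\, x_0^g = x\big] + c(x_0^g, g(x_0^g)) - b\\
&\ge \mathbb{E}\big[c_F(x_2^g) \,\big|\, x_0^g = x\big] - V_N^*(x) + \sum_{\ell=0}^{1}\mathbb{E}\big[c(x_\ell^g, g(x_\ell^g)) \,\big|\, x_0^g = x\big] - 2b\\
&\;\;\vdots \qquad (\text{iterating } N \text{ times})\\
&\ge \mathbb{E}\big[c_F(x_N^g) \,\big|\, x_0^g = x\big] - V_N^*(x) - Nb + \sum_{\ell=0}^{N-1}\mathbb{E}\big[c(x_\ell^g, g(x_\ell^g)) \,\big|\, x_0^g = x\big].
\end{aligned}$$
Since $(g, \ldots, g)$ is a sub-optimal policy for the optimal control problem (2.3),
$$V_N^*(x) \le \mathbb{E}\Big[\sum_{\ell=0}^{N-1} c\big(x_\ell^g, g(x_\ell^g)\big) + c_F(x_N^g) \,\Big|\, x_0^g = x\Big],$$
and therefore $c_F(x) - V_N^*(x) \ge -Nb$. Since $x$ is arbitrary, the assertion follows. □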

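Proof of Proposition 5. Fix $x \in \mathbb{R}^d$. We know from Theorem 3 that
$$\mathbb{E}_x^{\hat\pi}\big[V_N^*(x_1)\big] - V_N^*(x) \le -c\big(x, \pi_0^*(x)\big) + b.$$
By hypothesis, $c(x, \pi_0^*(x)) = c_s(x) + c_c(\pi_0^*(x)) \ge c_s(x)$, and if $x \notin K$, then $c_s(x) \ge \alpha c_F(x)$. Thus,
$$\mathbb{E}_x^{\hat\pi}\big[V_N^*(x_1)\big] - V_N^*(x) \le -c_s(x) + b \le -\alpha c_F(x) + b \quad \text{if } x \notin K.$$
In view of (3.6), we see that if $x \notin K$, then $-\alpha c_F(x) + b \le -\alpha V_N^*(x) + b(1 + \alpha N)$. In other words,
$$\mathbb{E}_x^{\hat\pi}\big[V_N^*(x_1)\big] - V_N^*(x) \le -\alpha V_N^*(x) + b(1 + \alpha N) \quad \text{for all } x \notin K.$$
Since $\lim_{\|z\|\to+\infty} c_s(z) = +\infty$, our hypotheses show that $\lim_{\|z\|\to+\infty} c(z, \pi_0^*(z)) = +\infty$, and from (3.6) it follows that $\lim_{\|z\|\to+\infty} V_N^*(z) = +\infty$. By definition of a limit, therefore, there must exist a closed ball $K'$ around $0 \in \mathbb{R}^d$, of radius large enough that $K \subset K'$ and $V_N^*(z) \ge 2b(\alpha^{-1} + N)$ for all $z \notin K'$. For such $z$ we have $b(1 + \alpha N) \le \frac{\alpha}{2}V_N^*(z)$, and substituting back we see that
$$\mathbb{E}_x^{\hat\pi}\big[V_N^*(x_1)\big] - V_N^*(x) \le -\frac{\alpha}{2}\,V_N^*(x) \quad \text{for all } x \notin K',$$
which is a geometric drift condition outside the compact set $K'$. The particular case follows from Proposition 1. □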

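Proof of Proposition 8. Fix $t \in \mathbb{N}_0$. Writing $\bar u_{\kappa t} := (u_{\kappa t}^\top \cdots u_{\kappa(t+1)-1}^\top)^\top$ and $\bar w_{\kappa t} := (w_{\kappa t}^\top \cdots w_{\kappa(t+1)-1}^\top)^\top$, the state recursion (3.7) shows that
$$x_{\kappa(t+1)} = A^\kappa x_{\kappa t} + \mathcal{R}(A,B)\,\bar u_{\kappa t} + \mathcal{R}(A, I_d)\,\bar w_{\kappa t}.$$
Let $\Lambda := \rho - U_{\max}$ and let $\lambda := e^{\Lambda}$; by hypothesis, $\Lambda < 0$ and $\lambda \in\, ]0,1[$. On the event $\{x_{\kappa t} \notin K\}$, we have
$$\begin{aligned}
\frac{\mathbb{E}\big[e^{\|x_{\kappa(t+1)}\|} \,\big|\, x_{\kappa t}\big]}{e^{\|x_{\kappa t}\|}}
&= \mathbb{E}\Big[\exp\big(\|A^\kappa x_{\kappa t} + \mathcal{R}(A,B)\bar u_{\kappa t} + \mathcal{R}(A,I_d)\bar w_{\kappa t}\| - \|x_{\kappa t}\|\big) \,\Big|\, x_{\kappa t}\Big]\\
&\le e^{\|A^\kappa x_{\kappa t} - \mathrm{sat}_{U_{\max}}(A^\kappa x_{\kappa t})\| - \|A^\kappa x_{\kappa t}\|}\;\mathbb{E}\Big[e^{\|\mathcal{R}(A,I_d)\bar w_{\kappa t}\|} \,\Big|\, x_{\kappa t}\Big] && \text{since } A \text{ is orthogonal}\\
&= e^{-U_{\max}}\;\mathbb{E}\Big[\exp\Big(\big\|\mathcal{R}(A,I_d)(w_0^\top \cdots w_{\kappa-1}^\top)^\top\big\|\Big)\Big] && \text{since } (w_s)_{s\in\mathbb{N}_0} \text{ is i.i.d.}\\
&= e^{-U_{\max} + \rho} = \lambda < 1 && \text{by hypothesis},
\end{aligned}$$
where the inequality also uses the triangle inequality and $\mathcal{R}(A,B)\,\mathcal{R}(A,B)^{+} = I_d$, which holds by reachability. Since $t$ is arbitrary, the drift inequality follows. By construction, the control actions $u_{\kappa t}, \ldots, u_{\kappa(t+1)-1}$ depend only on $x_{\kappa t}$ for each $t \in \mathbb{N}_0$. Thus, the process $(x_{\kappa t})_{t\in\mathbb{N}_0}$ is Markovian under the control actions in (3.8), as can be seen by directly verifying (2.7). By Proposition 1 we see that for each $x \in \mathbb{R}^d$ the sequence $(\mathbb{E}_x[e^{\|x_{\kappa t}\|}])_{t\in\mathbb{N}_0}$ is bounded. It remains to move from the $\kappa$-subsampled process $(x_{\kappa t})_{t\in\mathbb{N}_0}$ to the original process $(x_t)_{t\in\mathbb{N}_0}$. Standard arguments, along the lines of the proof of the main theorem in [16], employing the triangle inequality and monotonicity of the function $e^{(\cdot)}$, show that the sequence $(\mathbb{E}_x[e^{\|x_t\|}])_{t\in\mathbb{N}_0}$ is bounded.

For $x \in \mathbb{R}^d$ let $C = C(x) > 0$ be such that $\sup_{t\in\mathbb{N}_0}\mathbb{E}_x[e^{\|x_t\|}] \le C$. Then for all $t \in \mathbb{N}_0$,
$$C \ge \mathbb{E}_x\big[e^{\|x_t\|}\big] = \mathbb{E}_x\Big[\int_0^{\|x_t\|} e^r\,dr\Big] + 1 = \mathbb{E}_x\Big[\int_0^{+\infty}\mathbf{1}_{\{\|x_t\|>r\}}\,e^r\,dr\Big] + 1 = \int_0^{+\infty}\mathsf{P}_x(\|x_t\|>r)\,e^r\,dr + 1,$$
where the last equality follows from Fubini's theorem. This shows that $\mathsf{P}_x(\|x_t\|>r)$ must decay, for large values of $r$, faster than $e^{-r}$ uniformly for all $t \in \mathbb{N}_0$, and the assertion follows. □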
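The Fubini identity at the end of the proof, $\mathbb{E}[e^{X}] = 1 + \int_0^{+\infty}\mathsf{P}(X > r)\,e^r\,dr$ for $X \ge 0$, can be sanity-checked on a simple case. The sketch below assumes $X = |Z|$ with $Z \sim \mathcal{N}(0,1)$, for which $\mathbb{E}[e^{|Z|}] = 2e^{1/2}\Phi(1)$ and $\mathsf{P}(|Z| > r) = \operatorname{erfc}(r/\sqrt{2})$, evaluates the integral with a truncated trapezoidal rule, and checks the resulting tail bound $\mathsf{P}(X > r) \le \mathbb{E}[e^X]\,e^{-r}$ at a few points.

```python
import math


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tail_prob(r):
    """P(|Z| > r) for Z ~ N(0, 1); erfc avoids cancellation in 1 - Phi(r)."""
    return math.erfc(r / math.sqrt(2.0))


def lhs():
    """E[e^{|Z|}] in closed form: 2 e^{1/2} Phi(1)."""
    return 2.0 * math.exp(0.5) * Phi(1.0)


def rhs(r_max=15.0, n=30_000):
    """1 + integral_0^inf P(|Z| > r) e^r dr, trapezoidal rule truncated at r_max."""
    h = r_max / n

    def f(r):
        return tail_prob(r) * math.exp(r)

    s = 0.5 * (f(0.0) + f(r_max)) + sum(f(i * h) for i in range(1, n))
    return 1.0 + h * s
```

Truncating at $r = 15$ is harmless here because the integrand behaves like $e^{r - r^2/2}$ for large $r$.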



Appendix B.

This appendix contains our proof of Theorem 11. For two policies $\pi_{0:k_1-1}$ and $\pi'_{0:k_2-1}$ of length $k_1$ and $k_2$, respectively, we define their concatenation
$$\pi_{0:k_1-1} \sqcup \pi'_{0:k_2-1} := \big(\pi_0, \ldots, \pi_{k_1-1}, \pi'_0, \ldots, \pi'_{k_2-1}\big).$$


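Proof of Theorem 11. (11.i): We adapt certain ideas from [3] to the context of long-run expected average cost. Suppose that Assumption (A2) holds, and fix $n \in \mathbb{N}_0$. Conditional on $x_{n+1} = x' \in \mathbb{R}^d$, by definition of optimality,
$$V_N^*(x') \le \mathbb{E}^{\pi_{1:N-1}^*\sqcup g}\Big[\sum_{\ell=n+1}^{n+N} c(x_\ell, u_\ell) + c_F(x_{n+N+1}) \,\Big|\, x_{n+1} = x'\Big].$$
Conditional on $x_n = y$, therefore,
$$\begin{aligned}
\mathbb{E}^{\pi_0^*\sqcup\pi_{0:N-1}^*}\Big[\sum_{\ell=n+1}^{n+N} c(x_\ell,u_\ell) + c_F(x_{n+N+1}) \,\Big|\, x_n = y\Big]
&\le \mathbb{E}^{\pi_{0:N-1}^*\sqcup g}\Big[\sum_{\ell=n+1}^{n+N} c(x_\ell,u_\ell) + c_F(x_{n+N+1}) \,\Big|\, x_n = y\Big]\\
&= \mathbb{E}^{\pi_{0:N-1}^*\sqcup g}\Big[\sum_{\ell=n}^{n+N-1} c(x_\ell,u_\ell) + c_F(x_{n+N}) \,\Big|\, x_n = y\Big] - c\big(y, \pi_0^*(y)\big)\\
&\qquad + \mathbb{E}^{\pi_{0:N-1}^*\sqcup g}\Big[c_F(x_{n+N+1}) - c_F(x_{n+N}) + c(x_{n+N}, u_{n+N}) \,\Big|\, x_n = y\Big]\\
&= V_N^*(y) - c\big(y, \pi_0^*(y)\big) + \mathbb{E}^{\pi_{0:N-1}^*\sqcup g}\Big[c_F(x_{n+N+1}) - c_F(x_{n+N}) + c(x_{n+N}, u_{n+N}) \,\Big|\, x_n = y\Big].
\end{aligned}$$
The left-hand side equals $\mathbb{E}^{\hat\pi}[V_N^*(x_{n+1}) \mid x_n = y]$. Rearranging terms, conditional on $x_n = y$, we get
$$\begin{aligned}
c\big(y, \pi_0^*(y)\big) &\le V_N^*(y) - \mathbb{E}^{\hat\pi}\big[V_N^*(x_{n+1}) \,\big|\, x_n = y\big]\\
&\qquad + \mathbb{E}^{\pi^*}\Big[c\big(x_{n+N}, g(x_{n+N})\big) - c_F(x_{n+N}) + \mathbb{E}\big[c_F\circ f(x_{n+N}, g(x_{n+N}), w_{n+N}) \,\big|\, x_{n+N}\big] \,\Big|\, x_n = y\Big]\\
&= V_N^*(y) - \mathbb{E}^{\hat\pi}\big[V_N^*(x_{n+1}) \,\big|\, x_n = y\big] + \mathbb{E}^{\pi^*}\big[T_g(x_{n+N}) \,\big|\, x_n = y\big].
\end{aligned}$$
Suppose now that the receding horizon policy $\hat\pi$ is applied. Since the closed-loop process $(x_t)_{t\in\mathbb{N}_0}$ under $\hat\pi$ is Markovian, taking expectations under the policy $\hat\pi$, we get
$$\mathbb{E}_x^{\hat\pi}\big[c(x_n, \pi_0^*(x_n))\big] \le \mathbb{E}_x^{\hat\pi}\big[V_N^*(x_n)\big] - \mathbb{E}_x^{\hat\pi}\big[V_N^*(x_{n+1})\big] + \mathbb{E}_x^{\hat\pi}\Big[\mathbb{E}^{\pi^*}\big[T_g(x_{n+N}) \,\big|\, x_n\big]\Big],$$
whence, summing from $n = 0$ through $n = k$ (the terms involving $V_N^*$ telescope), we arrive at
$$\mathbb{E}_x^{\hat\pi}\Big[\sum_{n=0}^{k} c\big(x_n, \pi_0^*(x_n)\big)\Big] \le V_N^*(x) - \mathbb{E}_x^{\hat\pi}\big[V_N^*(x_{k+1})\big] + \sum_{n=0}^{k}\mathbb{E}_x^{\hat\pi}\Big[\mathbb{E}^{\pi^*}\big[T_g(x_{n+N}) \,\big|\, x_n\big]\Big],$$
as asserted.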



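(11.ii): Suppose that Assumption (A3) holds, and let $g$ be the feedback in Assumption (A3). Fix $x \in \mathbb{R}^d$. From (4.3), it follows that
$$\begin{aligned}
\frac{1}{k+1}\,\mathbb{E}_x^{\hat\pi}\Big[\sum_{n=0}^{k} c\big(x_n, \pi_0^*(x_n)\big)\Big]
&\le \frac{1}{k+1}\Big(V_N^*(x) - \mathbb{E}_x^{\hat\pi}\big[V_N^*(x_{k+1})\big]\Big) + \frac{1}{k+1}\sum_{n=0}^{k}\mathbb{E}_x^{\hat\pi}\Big[\mathbb{E}^{\pi^*}\big[T_g(x_{n+N}) \,\big|\, x_n\big]\Big]\\
&\le \frac{1}{k+1}\Big(V_N^*(x) - \mathbb{E}_x^{\hat\pi}\big[V_N^*(x_{k+1})\big]\Big) + b \qquad \text{by Assumption (A3)},
\end{aligned}$$
where the last inequality uses $T_g(z) \le b\,\mathbf{1}_K(z) \le b$ for all $z$, which is ((A3)-ii). This implies that
$$\limsup_{k\to+\infty}\frac{1}{k+1}\,\mathbb{E}_x^{\hat\pi}\Big[\sum_{n=0}^{k} c\big(x_n, \pi_0^*(x_n)\big)\Big] \le \limsup_{k\to+\infty}\frac{V_N^*(x) - \mathbb{E}_x^{\hat\pi}\big[V_N^*(x_{k+1})\big]}{k+1} + b = b$$
by hypothesis (4.4). The final claim follows at once from the preceding. □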